CS1.503 Mathematical Foundations of Data Science (Monsoon 2023)


Announcements

  • Instructors: Girish Varma, Suryajith Chillara
  • Teaching assistants: Nithish Raja
  • Schedule: Lectures: Tuesday and Friday, 14:00 – 15:25, Tutorial: Saturday, 12:40 - 13:40
  • Classroom: Digital classroom, B5 Vindhya
  • Course portal: elearn @ IIIT Hyderabad

Tentative topics

  1. Estimation from Random Samples

    • Examples in Vote Share surveys, Medical Tests etc.
    • Mathematical Formulation in terms of Random Variables
    • Linearity of Expectation & Tail Bounds
    • Gaussian & Confidence Intervals
    • Predictions, Precision/Recall Curve
  2. WWW Graph, Page Rank & Eigenvalues

    • World Wide Web Graph and Ranking Problem
    • Random Walks and Eigenvalues
    • Stationery Distributions and Degree
    • Convergence and Second Largest Eigen Value
  3. Dimensionality Reduction

    • Dimension Reduction Problem
    • Examples: Axis of a Spiral Galaxy, Recommender Systems
    • Singular Value Decomposition
    • Best fit subspaces from SVD
    • Low Rank Assumption and Applications
    • Projection to Random Subspace (Johnson-Lindenstrauss)
  4. Data Streaming Algorithms

    • Finding distinct elements, missing numbers and duplicates
    • Streaming algorithms
    • Fingerprinting Method
    • Frequency Moments and k-wise Independence
    • Limits of Streaming Algorithms
  5. Nearest Neighbor Search, Hashing and Clustering

    • Nearest Neighbor Classifier
    • Hashing
    • Appropriate NN from Locally Sensitive Hashing
    • Clustering
  6. Sublinear time algorithms

    • Property testing
    • Sublinear time algorithms for graphs
    • Sublinear time algorithms for boolean functions
    • Distribution testing
  7. Supervised Learning

    • PAC learning
    • Infinite hypothesis classes and VC dimension
    • Learning linear functions using gradient updates

References

  1. Avrim Blum, John Hopcroft, and Ravindran Kannan (2018), Foundations of Data Science, Free draft on web.
  2. Martin J. Wainwright and Michael I. Jordan (2018), Graphical Models, Exponential Families, and Variational Inference, Now publishers.
  3. S. Muthukrishnan (2004), Data Streams: Algorithms and Applications, survey.
  4. Moran Feldman (2020), Algorithms for Big Data, World Scientific.
  5. Shai Shalev-Schwartz and Shai Ben David (2014), Understanding Machine Learning: From theory to Algorithms, Cambridge University Press.